Using crowd-sourcing for query classification and analysis
Abstract
To better understand users and the intent behind their web searches, the first steps are to analyse the submitted query and to classify it correctly into a category, which then serves as an input for further analyses. Natural Language Processing (NLP) remains inaccurate in problem areas such as Word Sense Disambiguation (WSD) and the handling of detected terms that fall outside the scope of the verification source [16]. Other difficulties include named entity recognition, where capitalisation and language anomalies cause issues. These classic information retrieval issues are all relevant to query classification, and this proposal outlines another approach to classification that uses the intelligence and judgement of people to counter these challenges. Crowd-sourcing clearly has potential for tasks that require complex judgement and have many input factors to consider; with web search queries, however, subjectivity is less prominent, depending on the query type. The work outlined here describes experimentation with Amazon Mechanical Turk (MTurk) as a classification tool. A set of 1200 queries has been carefully selected, and an experiment will be set up in MTurk to classify those queries into categories derived from various sources. Once a given query has been classified (according to a minimum inter-ranked consistency for that query), it will be classified again into sub-categories of its confirmed higher-level category, and it is eventually removed from the system when its classification is definitive. This work is part of a study of the personalisation of web search activities. It is a preliminary stage in understanding user intent, which is required before evaluating current personalisation techniques and developing new or modified means of delivering search results based on various personalised factors.
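The staged procedure described in the abstract (classify at the top level, confirm once workers agree, then descend into sub-categories) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the taxonomy, the 0.7 threshold, the use of the modal label's vote share as the consistency measure, and the `collect_labels` callback that stands in for gathering MTurk worker answers.

```python
from collections import Counter

# Hypothetical two-level taxonomy; the real categories come from various sources.
TAXONOMY = {
    "Shopping": ["Electronics", "Clothing"],
    "Travel": ["Flights", "Hotels"],
}

MIN_AGREEMENT = 0.7  # assumed threshold, not taken from the paper


def consensus(labels, min_agreement=MIN_AGREEMENT):
    """Return the most common label if its vote share meets the threshold, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def classify(query, collect_labels, taxonomy=TAXONOMY):
    """Descend the taxonomy while workers agree; return the confirmed category path.

    collect_labels(query, options) is a hypothetical callback that returns
    one chosen label per worker for the given list of options.
    """
    path = []
    options = list(taxonomy)
    while options:
        choice = consensus(collect_labels(query, options))
        if choice is None:
            break  # no consensus yet: the query stays in the pool at this level
        path.append(choice)
        options = taxonomy.get(choice, [])  # empty list ends the descent
    return path
```

A query whose returned path reaches a leaf category would be treated as definitively classified and dropped from the pool; a shorter path means it needs more worker judgements at that level.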
Similar resources
A Hybrid Machine-Crowd Approach to Photo Retrieval Result Diversification
In this paper we address the issue of optimizing the actual social photo retrieval technology in terms of users’ requirements. Typical users are interested in taking possession of accurately relevant-to-the-query and non-redundant images so they can build a correct exhaustive perception over the query. We propose to tackle this issue by combining two approaches previously considered nonoverlapp...
InterPoll: Crowd-Sourced Internet Polls
Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little to help the survey-maker or pollster in obtaining statistically significant results devoid of even the obvious selection biases. This paper proposes InterPoll, a platform fo...
Pushing the Boundaries of Crowd-enabled Databases with Query-driven Schema Expansion
By incorporating human workers into the query execution process crowd-enabled databases facilitate intelligent, social capabilities like completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases by flexible query-driven schema expansion, allowi...
Skyline Queries over Incomplete Data - Cost Models
Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization problems, and personalization tasks in databases. They are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline query processing is severely hampered and often has to resort to error-prone heuristics. Unfortunately, incomplete d...
A Tale of Two Crowds: Public Engagement in Plankton Classification
“Big data” are becoming common in biological oceanography with the advent of sampling technologies that can generate multiple, high-frequency data streams. Given the need for “big” data in ocean health assessments and ecosystem management, identifying and implementing robust, and efficient processing approaches is a challenge for marine scientists. Using a large plankton imagery data set, we pr...
Publication date: 2012